Categorical Data Visualization and Clustering Using Subjective Factors

Authors

  • Chia-Hui Chang
  • Zhi-Kai Ding
Abstract

A common issue in cluster analysis is that there is no single correct answer to the number of clusters, since cluster analysis involves human subjective judgement. Interactive visualization is one method by which users can choose proper clustering parameters. In this paper, a new clustering approach called CDCS (Categorical Data Clustering with Subjective factors) is introduced, where a visualization tool for clustered categorical data is developed such that the result of adjusting parameters is instantly reflected. The experiment shows that CDCS generates high-quality clusters compared to other typical algo-

Similar resources

A Clustering Algorithm for Categorical Data Based on Combining Measures

Clustering is one of the main techniques in data mining. It is a process that partitions a data set into groups such that the data within a cluster are as close to each other as possible and the data in different clusters differ as much as possible. Clustering algorithms are divided into two categories according to the type of data: Clustering algorithms for numerical data and clustering algor...

Clustering High Dimensional Categorical Data via Topographical Features

Analysis of categorical data is a challenging task. In this paper, we propose to compute topographical features of high-dimensional categorical data. We propose an efficient algorithm to extract modes of the underlying distribution and their attractive basins. These topographical features provide a geometric view of the data and can be applied to visualization and clustering of real world chall...

Partitional Clustering of Malware Using K-Means

This paper describes a novel method for clustering datasets containing malware behavioural data. Our method transforms the data into a standardised data matrix that can be used in any clustering algorithm, finds the number of clusters in the data set, and includes an optional visualization step for high-dimensional data using principal component analysis. Our clustering method deals well with...

Simultaneous Topological Categorical Data Clustering and Cluster Characterization

In this paper, we propose a new automatic learning model that allows simultaneous topological clustering and feature selection for quantitative datasets. We explore a new topological organization algorithm for categorical data clustering and visualization named RTC (Relational Topological Clustering). Generally, it is more difficult to perform clustering on categorical data than on numeri...

Rough Set based Rule Induction Package for R

Rough set theory is a framework for dealing with uncertainty based on the computation of equivalence relations/classes. Since a probability is defined as a measure on a sample space, defined by equivalence classes, rough sets are closely related to probabilities at a deep mathematical level. Furthermore, since rough sets are closely related to Dempster-Shafer theory or fuzzy sets, this theory can...


Publication date: 2004